ABSTRACT

A lower bound of cN log N is proved for the mean time complexity of
an on-line multitape Turing machine performing the multiplication
of N-digit binary integers. For a more general class of machines
which includes some models of random-access machines, the
corresponding bound is cN log N/log log N. These bounds compare
favorably with known upper bounds of the form $cN(\log N)^k$, and for
some classes the upper and lower bounds coincide. The proofs are
based on the "overlap" argument due to Cook and Aanderaa.

1. INTRODUCTION

A challenging problem in the field of computational complexity is 
to prove lower bounds on the computing time for naturally defined 
algorithms executed by realistically powerful machinery. For a serial 
machine whose task is to map an input string to an output string, 
a trivial lower bound for many mappings is the number of steps required 
to read the input string. There are a number of combinatorial techniques, 
involving for example crossing sequences [cf. 6, §10.4], which are
adequate to derive nontrivial lower bounds but only for rudimentary
machines such as single-tape Turing machines. The powerful
diagonalization techniques are of use only for an input/output mapping
sufficiently structured to encode machine computations.

In this paper we expound and develop further the "overlap" argument 
introduced by Cook and Aanderaa [3], which establishes a nonlinear 
lower bound on the time required by a very general class of machines to 
perform multiplication of binary integers. (A similar argument has been 
used again recently by Aanderaa [1].) Our contribution relative to [3] is 
firstly that the main line of proof is somewhat shortened and simplified, 
secondly that the lower bound in [3] is increased by a factor log log N and
is shown to hold for the average rather than just the worst case, and
thirdly a new observation is made which yields an even stronger result 
for the case of multitape Turing machines. In some cases we show our 
new results to be optimal to within a constant factor by exhibiting 
suitable multiplication algorithms. 

Our main results are lower bounds for "on-line" multiplication.
A mapping from an input string to an output string is said to be carried
out on-line if for all n the n-th output symbol is printed after the n-th
and before the (n+1)-st input symbol is read.* For on-line multiplication
the multiplicands are given in binary, least significant digit first,
and each input symbol encodes the two corresponding input digits. We
may as well assume that, for N-digit arguments, only the least
significant N digits of the product are to be produced. (The remaining
digits may be obtained, if desired, by concatenating N zeros to the
arguments.) On-line multiplication is of course possible, though a
naive implementation may take time at least proportional to n between
the (n-1)-st and n-th digits, with therefore a time of order $N^2$ for an
N-digit product. We show here that the minimum average computation time
for on-line multiplication is bounded below and above by functions of
the form $N(\log N)^k$, where for the lower bounds k is approximately 1 and
for the upper bounds k is approximately 1 or 2 depending on the class
considered. The exact results are given in Sections 6, 7 and 8. For
further background and motivation the reader is referred to [3].

* For technical convenience, we use this strong form of the definition which
prohibits an output from being produced too soon. It is not a serious
restriction for two reasons. Firstly, for binary multiplication, the i-th
digit of the product cannot be determined until the i-th digits of the two
inputs have been read except when both numbers are even; hence a machine
can take advantage of the weaker definition for at most a quarter of all
possible inputs, changing our mean-time bounds by only a constant factor.
Secondly, any BAM may be modified without time loss to obey the strong
definition by adding a two-headed linear tape to serve as an output buffer.
This does not affect any of our lower bounds. (Cf. [5].)
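
As an illustration of the on-line definition, the naive method can be
sketched in Python. The names `online_multiply`, `to_digits` and
`low_product_digits` and the generator interface are assumptions of this
sketch, not part of the machine model. Each input symbol is a pair of
digits, least significant first, and the n-th output digit is produced
after the n-th pair has been consumed and before the next pair is
requested. Forming the n-th column takes about n digit products, which is
where the quadratic total time of the naive method comes from.

```python
def online_multiply(pairs):
    """Yield the low-order product digits, one per input symbol (LSB first)."""
    xs, ys = [], []          # digits read so far
    carry = 0                # carry into the current column
    for x, y in pairs:       # read the n-th input symbol
        xs.append(x)
        ys.append(y)
        n = len(xs) - 1
        # Column n of the product: sum of x_i * y_(n-i), plus the incoming carry.
        column = carry + sum(xs[i] * ys[n - i] for i in range(n + 1))
        carry = column >> 1
        yield column & 1     # output digit n before the next symbol is read


def to_digits(value, n):
    """Return the n low-order binary digits of value, least significant first."""
    return [(value >> k) & 1 for k in range(n)]


def low_product_digits(x, y, n):
    """Run the on-line multiplier on two n-digit numbers."""
    return list(online_multiply(zip(to_digits(x, n), to_digits(y, n))))
```

For example, `low_product_digits(13, 11, 4)` gives the four low-order
digits of 143, least significant first.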



2. MACHINE MODELS 

In the class of machines to which our proofs apply, we wish to include 
not only the familiar multitape Turing machine but also Turing machines 
with tapes of higher dimension and some suitably tame "random access 
machines". We shall have to exclude iterative arrays and other machines
with unlimited parallelism since these are able to do multiplication in
"real-time" [2]. Our definitions follow [3] fairly closely, with minor 
differences in order to simplify the notation and proof or to take fuller 
advantage of the power of the proof technique. The reader is assumed to 
have experience with the basic definitions and techniques of automata 
theory [6]. 

A bounded activity machine (BAM) has a deterministic finite-state
control which operates with a one-way read-only input tape, a one-way
write-only output tape and a storage structure. The storage structure is
a countable set of locations each of which can hold a binary value. The
store is accessed and modified by a finite, fixed number of work heads
whose moves are specified by a finite set of shifts
$\varphi_1, \ldots, \varphi_p$. For each i, $\varphi_i$ is a map from the
set of locations into itself, and a head at some location x may be moved
in one step to the location $\varphi_i(x)$.

A complete step of the BAM is described as follows. Depending on the
state of the finite control, the input tape may be advanced one symbol
and precisely one work head must be "moved" by one of the shifts.
Then, depending on the control state, the new input symbol (if any), and
the value stored in the storage location to which the head is moved, a
new value may be stored, an output symbol may be given, and a new control
state is entered. Thus, for each given symbol from the input tape or a
storage location, there is a unique step at which it is read, and the
definition prevents it from being reread later. Moreover, only one work
tape symbol is read per step, so we may speak of "the work symbol read at
step s".



Various restrictions in this definition, such as binary storage, 
one head move per step, and the lack of dependence of the new step on the 
old storage values, are introduced to simplify the exposition and cause a time
penalty of at most a constant factor compared with more versatile machines. 

We shall say that a computation is real-time if it is on-line and each 
input symbol is read a fixed number of steps after the previous input. 
Note that we shall not require the store to be initially "empty" except 
for the special class of "uniform" machines defined below. 

It is easy to design a BAM which can multiply in real time. A
suitable storage structure is based on an infinite binary tree, traversed
by a single head which takes left or right branches depending on the input
digits. The correct output is either already stored in the location of
the tree at the start or else is encoded in the structure itself in an
obvious way. For example the structure may have a transformation $\psi$
such that for all x, either $\psi(x) = x$ or else $\psi(\psi(x)) = x$ and
$\psi(x) \ne x$. Which alternative holds can be determined for any
location by a sequence of a few steps.

Two classes introduced by Cook and Aanderaa ([3]) to evade such an
oracular construction are the polynomial-limited and uniform machines
defined below. We also add two further classes.

(i) Polynomial-limited.

A storage structure is polynomial-limited if there are constants
c, d such that for all locations x and for all t, the number of locations
accessible from x in t steps is no greater than $ct^d$. A BAM with such a
structure is a polynomial-limited machine.
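
The growth condition can be made concrete with a small breadth-first
count. The sketch below is only an illustration under stated assumptions:
the function `reachable`, the three example structures `TAPE`, `GRID` and
`TREE`, and the reading of "accessible in t steps" as "reachable by at
most t shift applications" are choices made here, not definitions taken
from the text.

```python
def reachable(start, shifts, t):
    """Count locations reachable from start using at most t shift applications."""
    seen = {start}
    frontier = {start}
    for _ in range(t):
        # Apply every shift to every newly found location.
        frontier = {phi(x) for x in frontier for phi in shifts} - seen
        seen |= frontier
    return len(seen)


# Linear tape: locations are integers, shifts move one square left or right.
TAPE = [lambda x: x - 1, lambda x: x + 1]

# Two-dimensional tape: locations are integer pairs.
GRID = [
    lambda p: (p[0] - 1, p[1]),
    lambda p: (p[0] + 1, p[1]),
    lambda p: (p[0], p[1] - 1),
    lambda p: (p[0], p[1] + 1),
]

# Infinite binary tree: locations are bit strings, the root is the empty string.
# The parent shift leaves the root fixed.
TREE = [lambda x: x + "0", lambda x: x + "1", lambda x: x[:-1]]
```

On these examples the count grows as 2t + 1 for the linear tape and as
2t^2 + 2t + 1 for the two-dimensional tape, both polynomial in t, while
from the root of the tree it grows as 2^(t+1) - 1, so the tree is not
polynomial-limited.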

(ii) Uniform.

A storage structure is uniform if for each pair of locations
x, y, there is a permutation f such that f(x) = y and for each
shift $\varphi_i$ of the structure, $f \circ \varphi_i = \varphi_i \circ f$.
A BAM with a uniform structure which is initially "empty" (i.e. each
location has the same initial value) is a uniform machine. The reader is
referred to [3] for further discussion of these and other classes.

With a suitable form of definition, Turing machines, even with
multiple heads and multidimensional tapes, satisfy both restrictions.
Our main result holds for machines satisfying either restriction. The
BAM described above which multiplies in real-time satisfies neither.

(iii) One-dimensional multihead multitape Turing machines. 

We can obtain a stronger result than for classes (i) and (ii) if we 
restrict the tapes to be linear, i.e. one-dimensional.

(iv) Oblivious machines. 

For this class we turn our attention from the storage structure, 
which may be arbitrary, to the form of the finite state control or 
"program". The (single) storage location accessed at each step defines 
the storage sequence for any computation, and this depends in general 
on the input. A machine is oblivious if for input sequences of a given 
length the storage sequence is fixed, i.e. independent of the input 
symbols. Naturally, the control state and the values inscribed in the 
store can, and in general do, depend on the input symbols; it is just 
the movement of heads which is invariant. Our interest in oblivious 
machines is two-fold. Firstly the restriction permits a very simple 
proof of an improved lower bound, and secondly it 
happens that almost all the algorithms proposed or used for multiplication 
are oblivious or can be made oblivious at the cost of only a constant 
factor in time. 

In Section 7 we shall retrospectively consider other classes of
machines to which the proof technique applies.



3. RETRORSE FUNCTIONS 

Informally, a function from an input string to the output is
retrorse if the output values in any segment depend very heavily on the
input values of the immediately preceding segment, and so the function
evaluator needs to "turn back" to the previous input segment. On-line
multiplication will be shown to be very retrorse.

We define $K_N = \sum_{2^i < N} 2^{2^i}$, so the binary expansion of
$K_N$ has a "1" in position i iff i is a power of two, where the positions
are numbered starting with 0 at the right (lower-order) end. The
usefulness of $K_N$ is that multiplication of N-digit numbers by $K_N$ is
extremely retrorse, and the main proof is simpler than for two-input
multiplication. Our first theorem provides a lower bound on the average
time for on-line multiplication of an N-digit integer by $K_N$ and hence
also on the worst-case time for on-line multiplication. In the second
theorem we show that the same bound holds for the average time for general
on-line multiplication.

Figure 1 represents the multiplication of $K_N$ by an N-digit number X
with result Z. It is drawn in the conventional way with least significance
to the right. R and M are non-negative integers, and we shall always take
R to be a power of two, $R = 2^r$. W represents the subfield of X
consisting of bits $X_{M+R-1} \cdots X_{M+1} X_M$, and Y represents the
subfield of Z consisting of bits $Z_{M+2R-1} \cdots Z_{M+R+1} Z_{M+R}$.
We will at times think of W and Y as R-bit integers in the range 0 to
$2^R - 1$. To say that W assumes a value i means that we imagine placing
the binary representation of i into the W subfield of X. This in turn
causes Z to change, for Z always means the product $K_N \cdot X$, and that
in turn affects the value of Y. We investigate the dependence in this way
of Y upon W.

* We choose not to follow Cook and Aanderaa in their choice of "complex"
to describe these functions.

The way in which Y varies with W of course depends on the remaining
bits of X, other than those in W, which we denote by X\W. As a number,
X\W is the value of the binary string obtained by setting the W-field
of X to zero.

For some particular fixed value of X\W, let W range through all
possible values 0, 1, ..., $2^R - 1$, so
$X_i = (X\backslash W) + i \cdot 2^M$, $0 \le i \le 2^R - 1$.
Let $Z_0, Z_1, \ldots$ and $Y_0, Y_1, \ldots$ be the corresponding values
of Z and Y, that is, $Z_i = K_N \cdot X_i$, and $Y_i$ is the Y-field of
$Z_i$.

If i < j, then

$Z_j - Z_i = (X_j - X_i) \cdot K_N = (j - i) \cdot 2^M \cdot K_N.$

Since $K_N = K_{2R} + 2^{2R} \cdot \bar{K}$ for some integer $\bar{K}$, we have

$Z_j - Z_i \equiv (j - i) \cdot 2^M \cdot K_{2R} \pmod{2^{M+2R}}.$

Now suppose $Y_i = Y_j$. Then

$Z_j - Z_i \equiv a \cdot 2^M \pmod{2^{M+2R}}$

for some integer a, $|a| < 2^R$, and hence

$(j - i) \cdot K_{2R} \equiv a \pmod{2^{2R}}.$

Since $R = 2^r$,

$2(2^R - 1) \ge K_{2R} = 2^{2^r} + 2^{2^{r-1}} + \cdots + 2^{2^0} \ge 2^R.$

By the right-hand inequality, $(j - i) \cdot K_{2R} \ge 2^R > a$, and hence
for the congruence to hold, we must have

$(j - i) \cdot K_{2R} \ge a + 2^{2R} > 2^{2R} - 2^R.$

From the left-hand inequality for $K_{2R}$,

$j - i > 2^{R-1}.$

Hence, for any i, there is at most one j > i such that $Y_i = Y_j$, and we
have proved:

Lemma 1. For fixed values of M, R, N and X\W, each value of Y can
arise from at most two values of W.
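
Lemma 1 can be checked by brute force for small parameters. The following
Python sketch is an illustration only: the function names `k_multiplier`
and `max_preimages` are hypothetical, and the sketch assumes that R is a
power of two and that M + 2R <= N, so that the Y-field lies within the N
low-order digits of the product.

```python
from collections import Counter


def k_multiplier(n):
    """Return the number with a 1 in each position 2**i < n (positions from 0)."""
    total, i = 0, 0
    while 2 ** i < n:
        total += 1 << (2 ** i)
        i += 1
    return total


def max_preimages(n, m, r, x_rest):
    """Largest number of W values that yield one and the same Y value.

    x_rest is the n-digit input with its W-field (bits m .. m+r-1) set to zero.
    """
    k = k_multiplier(n)
    mask = (1 << r) - 1
    counts = Counter()
    for w in range(1 << r):
        x = x_rest + (w << m)            # place w into the W-field
        z = k * x                        # the product
        y = (z >> (m + r)) & mask        # bits m+r .. m+2r-1 of the product
        counts[y] += 1
    return max(counts.values())
```

With N = 4, M = 0, R = 2 and X\W = 0, the four values of W give the
Y values 0, 1, 3, 0, so `max_preimages(4, 0, 2, 0)` is 2, which is the
largest value the lemma permits.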



4. OVERLAP

This concept is the basis for a very elegant counting argument
introduced in [3]. It has recently been put to use again by Aanderaa [1].
The motivation for its definition comes from the computation of very
retrorse functions. The obvious way in which information about a previous
input segment can be obtained is by revisiting locations which were visited
when that segment was being read. Overlap is defined in terms only of the
storage sequence defined previously. If two successive accesses to the same
location $\ell$ occur at steps $s_1$ and $s_2$ ($s_1 < s_2$), then the pair
$(s_1, s_2)$ is called an overlap pair, $\ell$ is called the overlap
location of $s_2$, and the value stored in $\ell$ at step $s_1$ and
referenced at step $s_2$ is called the overlap value of $s_2$. The total
overlap is $\Omega = |\{(s_1, s_2) \mid (s_1, s_2)$ is an overlap
pair$\}|$. Clearly the total time $T \ge \Omega$, since each step s is
the second component of one or zero overlap pairs depending on whether
the location accessed at step s has been accessed before or not.

Let $C_1$, $C_2$ be disjoint contiguous time intervals during a computation.
We define overlap($C_1$, $C_2$) to be the number of overlap pairs
$(s_1, s_2)$ for which $s_1 \in C_1$ and $s_2 \in C_2$.
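
These definitions translate directly into a few lines of Python. The
sketch below is illustrative: it assumes the storage sequence is given as
a list whose entry s is the location accessed at step s, and the function
names `overlap_pairs` and `overlap` are chosen here for convenience.

```python
def overlap_pairs(storage_sequence):
    """Return all overlap pairs (s1, s2): successive accesses to one location."""
    last_access = {}                     # location -> most recent step
    pairs = []
    for step, location in enumerate(storage_sequence):
        if location in last_access:
            pairs.append((last_access[location], step))
        last_access[location] = step
    return pairs


def overlap(pairs, interval_1, interval_2):
    """Count overlap pairs with s1 in interval_1 and s2 in interval_2.

    The intervals are given as range objects over step numbers.
    """
    return sum(1 for s1, s2 in pairs if s1 in interval_1 and s2 in interval_2)
```

For the storage sequence `[0, 1, 0, 2, 1, 0]` the overlap pairs are
(0, 2), (1, 4) and (2, 5), so the total overlap is 3 against a total time
of 6, and two of the pairs cross from the first three steps into the last
three.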

Without loss of results we assume $N = 2^n$. For any i = 0, ..., n, which
we call the level, define $R_i = 2^i$, and if
$S = S_{N-1} S_{N-2} \cdots S_1 S_0$ is any string of length N, we
partition S into contiguous blocks
$S_{i,2^{n-i}-1}, \ldots, S_{i,1}, S_{i,0}$ of length $R_i$, where
$S_{i,j} = S_{(j+1)R_i - 1} \cdots S_{j R_i}$, $0 \le j \le 2^{n-i} - 1$.
If X is the input string, the time interval $C_{i,j}$ starts
as the rightmost digit of $X_{i,j}$ is read and continues until the
rightmost digit of $X_{i,j+1}$ is about to be read (or until the
computation ends if there is no such j+1).

Let $t_{i,j}$ be the length of time of $C_{i,j}$. Clearly the total time
of the computation $T = \sum_j t_{i,j}$ for each level i. We define
$\omega_{i,j}$ = overlap($C_{i,j}$, $C_{i,j+1}$) for all suitable i, j,
and also

$\omega_i = \sum_j \omega_{i,2j}$

for any i.

Lemma 2. Total overlap $\Omega = \sum_i \omega_i$.

Proof. Let $(s_1, s_2)$ be an overlap pair and let i be the least level
such that $s_1, s_2 \in C_{i,j}$ for some j. $C_{i,j}$ is the
concatenation of the two intervals $C_{i-1,2j}$ and $C_{i-1,2j+1}$ at the
next lower level. By our choice of i, $s_1 \in C_{i-1,2j}$ and
$s_2 \in C_{i-1,2j+1}$, so $(s_1, s_2)$ contributes to $\omega_{i-1,2j}$
and hence to $\omega_{i-1}$. Suppose it contributes to $\omega_{i',2j'}$.
Then $i' \le i-1$, for $s_1$ and $s_2$ belong to the same interval for
each level above i-1. Also $i' \ge i-1$, since if $(s_1, s_2)$ contributes
to $\omega_{i',2j'}$ then both $s_1$ and $s_2$ are in the same block
$C_{i'+1,j'}$ at level i'+1. Hence i' = i-1, and it is clear that j' = j.
We conclude that each overlap pair contributes exactly once to exactly one
$\omega_i$ and hence exactly once to $\sum_i \omega_i$. □
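
The counting in this proof can be replayed on a concrete storage sequence.
The Python sketch below is an illustration under stated assumptions: the
names `level_overlaps` and `crossing_pairs` and the list `read_steps`
(giving the step at which each input digit is read) are hypothetical, and
the check is restricted to overlap pairs whose two steps lie in different
level-0 intervals, which is a simplification made for the sketch.

```python
from bisect import bisect_right


def overlap_pairs(storage_sequence):
    """Return all overlap pairs (s1, s2): successive accesses to one location."""
    last_access, pairs = {}, []
    for step, location in enumerate(storage_sequence):
        if location in last_access:
            pairs.append((last_access[location], step))
        last_access[location] = step
    return pairs


def level_overlaps(storage_sequence, read_steps):
    """Return [omega_0, ..., omega_(n-1)] for N = 2**n input digits.

    read_steps[k] is the step at which digit k is read; the list must be
    increasing and start at 0 so that every step lies in some interval.
    """
    n = len(read_steps).bit_length() - 1
    assert len(read_steps) == 1 << n and read_steps[0] == 0
    omegas = [0] * n
    for s1, s2 in overlap_pairs(storage_sequence):
        b1 = bisect_right(read_steps, s1) - 1    # level-0 interval of s1
        b2 = bisect_right(read_steps, s2) - 1    # level-0 interval of s2
        for i in range(n):
            j1, j2 = b1 >> i, b2 >> i            # level-i intervals
            if j1 % 2 == 0 and j2 == j1 + 1:     # s1 in C(i,2j), s2 in C(i,2j+1)
                omegas[i] += 1
    return omegas


def crossing_pairs(storage_sequence, read_steps):
    """Count overlap pairs whose steps lie in different level-0 intervals."""
    return sum(
        1
        for s1, s2 in overlap_pairs(storage_sequence)
        if bisect_right(read_steps, s1) != bisect_right(read_steps, s2)
    )
```

Each crossing pair is counted at exactly one level, namely the level just
below the lowest one at which its two steps share an interval, so the
entries of `level_overlaps` sum to `crossing_pairs`.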



5. COMPUTATIONS WITH SMALL OVERLAP 

We consider an on-line computation of some machine $\mathfrak{M}$ from
input X to output Z. As before, M and R are fixed numbers, W is the
length R subword of X, $X_{M+R-1} \cdots X_M$, and Y is the length R
subword of Z, $Z_{M+2R-1} \cdots Z_{M+R}$. Throughout this section, X\W,
M and R remain fixed, and we explore how the value of Y changes as the
value of W is varied. Unlike the previous sections, Z now represents the
output of $\mathfrak{M}$ on input X.

Giving a particular value to W completely determines the computation
of $\mathfrak{M}$. Let $s_0$ be the step which advances the input head
onto the first symbol of W, let $s_1$ be the step which moves the input
head off of the last symbol of W, and let $s_2$ be the step which reads
the next input symbol after producing Y. Define interval $C_W$ to be the
steps from $s_0$ to $s_1 - 1$, $C_Y$ the steps from $s_1$ to $s_2 - 1$,
and let $t_W$ and $t_Y$ be the lengths of time associated with $C_W$ and
$C_Y$, respectively. T is the total time of the computation, and we let
$\omega$ = overlap($C_W$, $C_Y$). This notation is illustrated in Figure 2.

As W varies, so do $\omega$, $t_Y$, $t_W$, T and Y. Let
$Q(\hat\omega, \hat t, \hat T)$ be the total number of different Y values
yielded by those W such that $\omega \le \hat\omega$, $t_Y \le \hat t$,
and $T \le \hat T$. If $\mathfrak{M}$ is computing a retrorse function,
$Q(\hat\omega, \hat t, \hat T)$ must be large, and we will use this fact
to deduce the constraints on $\hat\omega$, $\hat t$ and $\hat T$ that
eventually lead to our lower bound on T.

Our upper bounds on Q depend on the kind of machine, but all are
obtained using the same general method. For a given value of W, we observe
the computation during the interval $C_Y$ and we record in a suitable
way information about W that affects the computation in $C_Y$, such as the
state of the control and the positions of the heads at step $s_1$ (the
beginning of $C_Y$), the steps during $C_Y$ at which overlap with $C_W$
occurs (W-overlap steps), and the overlap value for each W-overlap step.
For each class of machines, enough information will be recorded to ensure
the validity of:

Condition 1. Let w and w' be two values for W and y and y' be the
corresponding values of Y. If the information recorded for w and w' is
the same, then y = y'.

It follows immediately from Condition 1 that the number of different
possible information records obtained from values of W for which
$\omega$, $t_Y$, and T are bounded respectively by $\hat\omega$, $\hat t$,
and $\hat T$ is an upper bound on $Q(\hat\omega, \hat t, \hat T)$.

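Lemma 3. There exists a constant C depending only on $\mathfrak{M}$ such
that, for $\hat T > 1$ and $\hat\omega \le \hat t \le \hat T$:

(a) $Q(\hat\omega, \hat t, \hat T) \le \hat T^C \cdot 2^{\hat\omega} \cdot \binom{\hat t}{\hat\omega}$
if $\mathfrak{M}$ is polynomial-limited or uniform;

(b) $Q(\hat\omega, \hat t, \hat T) \le \hat T^C \cdot 2^{\hat\omega}$
if $\mathfrak{M}$ is a one-dimensional multihead multitape Turing machine;

(c) $Q(\hat\omega, \hat t, \hat T) \le C \cdot 2^{\hat\omega}$
if $\mathfrak{M}$ is oblivious.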

Proof.

(a) Case 1: $\mathfrak{M}$ is polynomial-limited. The information record
consists of the state of the control, the position of each head at time
$s_1$, a subset $\theta = \{t_1, \ldots, t_{\hat\omega}\}$ of the integers
from 0 through $\hat t - 1$, and a binary sequence v of length
$\hat\omega$. $\theta$ is chosen to include the times, relative to $s_1$,
of all the W-overlap steps. The i-th bit of v equals the symbol referenced
at time $s_1 + t_i$, which will be the overlap value if $s_1 + t_i$ is a
W-overlap step. This ensures that Condition 1 is satisfied.

The total number of such records is clearly at most
$c \cdot 2^{\hat\omega} \cdot \binom{\hat t}{\hat\omega} \cdot H^c$,
where H is the number of possible positions for each head at step $s_1$.
Since $\mathfrak{M}$ is polynomial-limited, $H \le c\hat T^d \le \hat T^{c'}$
for a suitable constant c', yielding the bound of part (a).

(a) Case 2: $\mathfrak{M}$ is uniform. This case is exactly like Case 1
except that we do not record the actual head position at time $s_1$, for
the number of possible positions is too large. Rather, we record for each
head h the step of $C_Y$ (if any) at which h first visits a square $\ell$
visited prior to step $s_0$ together with the time of the first visit to
$\ell$ (which uniquely specifies $\ell$). Call such a location $\ell$
filled. (In the case of more than one head, a small amount of additional
information must be recorded to account for the possible interactions
among the heads before revisiting a filled location. This argument is
presented in more detail in [3].)

If the head h never visits a filled location during $C_Y$, then
because of uniformity the symbols read and written by the head can be
uniquely determined from the overlap steps and values, without knowing
the position of h at time $s_1$. On the other hand, if h does visit a
filled location $\ell$, then the step of $C_Y$ at which $\ell$ is visited
together with $\ell$ itself uniquely determine the position of the head at
step $s_1$, again by the uniformity condition. $\ell$ can be specified by
the time of its first visit, so there are at most
$\hat t \cdot \hat T \le \hat T^2$ different starting positions of a
single head h which must be distinguished.

(b) $\mathfrak{M}$ is a one-dimensional multihead multitape Turing machine.
$\mathfrak{M}$ is a special case of a polynomial-limited machine, so we may
record the state and starting head positions as in (a), Case 1. However,
the positions at which overlap will occur may be specified much more
succinctly, for the squares visited during $C_W$ by each head form an
interval. Thus, only the endpoints need be named. Since at most
$2\hat T$ locations on a linear tape can be reached in $\hat T$ steps by a
given head, there are at most $2\hat T^2$ possible intervals per head. As
before, a binary sequence of length $\hat\omega$ is sufficient in which to
record the overlap values. Thus, the total number of such records is at
most $c \cdot H^c \cdot (2\hat T^2)^c \cdot 2^{\hat\omega}$, where H is as
in (a), Case 1.

(c) $\mathfrak{M}$ is oblivious. The positions of the heads at each step
are independent of W, so only the state of the control at step $s_1$ and
the values of the overlap locations need to be recorded, giving the bound
$C \cdot 2^{\hat\omega}$. □

We finish our preparations for the main proof with a combinatorial
lemma. By way of motivation, let $\mathfrak{M}$ be a machine that
multiplies on-line and let $\hat\omega$, $\hat t$, and $\hat T$ be bounds
on $\omega$, $t_Y$, and T respectively such that
$Q(\hat\omega, \hat t, \hat T) < 2^{3R/4}$. By Lemma 1, all but
$2 \cdot 2^{3R/4}$ values of W, a vanishing fraction of the $2^R$ values,
cause one of the three bounds to be exceeded.

This gives an implicit lower bound on $\omega$ in terms of $t_Y$ and T
which says in effect that if T is small, then the total overlap $\Omega$
is large, which implies that T is large. Hence, T must be large on the
average.

Lemma 4. Let C, R, a be positive constants such that log a > 2(C + 3).
If $0 < \omega \le t$ and $\omega + t/a + \log T \le R/(2 \log a)$, then
$T^C \cdot 2^{\omega} \cdot \binom{t}{\omega} < 2^{3R/4}$.
(All logarithms in this paper are taken to base 2.)

Proof. For any $p \ge q > 0$,

$\binom{p}{q} \le \left(\frac{ep}{q}\right)^q$

by Stirling's formula. Assume the hypothesis, so $\omega < R/(2 \log a)$,
$t < aR/(2 \log a)$, and $\log T < R/(2 \log a)$. Then

$T^C \cdot 2^{\omega} \cdot \binom{t}{\omega} \le T^C \cdot \left(\frac{2et}{\omega}\right)^{\omega},$

which is monotonic increasing in $\omega$, t and T since $\omega \le t$. So

$T^C \cdot \left(\frac{2et}{\omega}\right)^{\omega} < 2^{CR/(2 \log a)} \cdot \left(\frac{e \cdot aR/\log a}{R/(2 \log a)}\right)^{R/(2 \log a)}$

$= 2^{CR/(2 \log a)} \cdot (2ae)^{R/(2 \log a)}$

$< 2^{3R/4}.$ □
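
The inequality of Lemma 4 can also be checked numerically for particular
constants. The Python sketch below is a sanity check, not a proof: the
function names `log2_binomial` and `smallest_margin`, the stride used to
sample t, and the sample constants in the usage note are all assumptions
of the sketch. It works with base-2 logarithms of both sides and, for each
sampled pair of integers with 0 < w <= t, takes the largest T that the
hypothesis allows.

```python
import math


def log2_binomial(t, w):
    """Return log2 of the binomial coefficient C(t, w), via log-gamma."""
    natural = math.lgamma(t + 1) - math.lgamma(w + 1) - math.lgamma(t - w + 1)
    return natural / math.log(2)


def smallest_margin(C, a, R, t_stride=64):
    """Return the least value of 3R/4 - log2(T^C * 2^w * C(t, w)) over the scan.

    The scan covers integer w and sampled integer t with 0 < w <= t, and for
    each pair uses the largest log T satisfying
    w + t/a + log T <= R / (2 log a).
    """
    assert math.log2(a) > 2 * (C + 3)        # hypothesis on the constants
    bound = R / (2 * math.log2(a))
    worst = math.inf
    for w in range(1, int(bound) + 1):
        t = w
        while True:
            log_T = bound - w - t / a        # largest admissible log T
            if log_T < 0:                    # T >= 1 is no longer possible
                break
            lhs = C * log_T + w + log2_binomial(t, w)
            worst = min(worst, 0.75 * R - lhs)
            t += t_stride
    return worst
```

For instance, with C = 1, a = 512 and R = 1024 the hypothesis
log a > 2(C + 3) reads 9 > 8, and the returned margin is positive, as the
lemma requires.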



6. MAIN RESULTS AND PROOFS 

Theorem 1. There is a constant C such that for any BAM $\mathfrak{M}$
which for all N multiplies N-digit numbers by $K_N$ on-line, the mean time
T(N) over all numbers of length N satisfies the following bounds for all
sufficiently large N.

(i) If $\mathfrak{M}$ is polynomial-limited or uniform,
T(N) > CN log N/log log N;
(ii) If $\mathfrak{M}$ is a one-dimensional multihead multitape Turing
machine or is oblivious,
T(N) > CN log N.

Proof. Suppose first that $\mathfrak{M}$ is polynomial-limited or uniform.
We may assume that log log N > 2(C + 3) where C is as in Lemma 3, and
define a = log N. We consider again the situation depicted in Figures 1
and 2, where X\W is fixed and W is allowed to vary. Applying Lemmas 1,
3(a) and 4, we deduce that the number of distinct W's for which
$\omega + t/a + \log T \le R/(2 \log a)$ is less than $2 \cdot 2^{3R/4}$.
Hence certainly,

$\mathrm{mean}_W\, (\omega + t/a + \log T) > R/(3 \log a)$ for $R \ge 16$.

Since this inequality holds for all values of X\W,

$\mathrm{mean}_X\, (\omega + t/a + \log T) > R/(3 \log a)$

where the mean is taken over all N-digit numbers. If we assume that
$\mathrm{mean}_X\, T < N^2$ then

$\mathrm{mean}_X \log T \le \log \mathrm{mean}_X\, T < 2 \log N$

since the geometric mean is less than or equal to the arithmetic mean.
Now use the identity: mean(A + B) = mean(A) + mean(B). Therefore

$\mathrm{mean}_X\, (\omega + t/a) > R/(3 \log a) - 2 \log N > R/(4 \log a)$

provided that $R > 24 \cdot (\log N) \cdot (\log\log N)$.

Now we suppose $W = X_{i,2j}$ and $Y = Z_{i,2j+1}$ for some i, j, so that
$\omega = \omega_{i,2j}$, $t = t_{i,2j+1}$ and $R = R_i = 2^i$. Taking the
intervals in pairs by summing over j, the previous inequality gives

$\mathrm{mean}_X \left(\omega_i + \sum_j t_{i,2j+1}/a\right) = \sum_j \mathrm{mean}_X\, (\omega_{i,2j} + t_{i,2j+1}/a)$

$> \sum_j R_i/(4 \log a) = N/(8 \log a).$

If we assume that $\mathrm{mean}_X\, T < N \log N/(16 \log\log N)$, then
since $\sum_j t_{i,2j+1} < T$ we have

$\mathrm{mean}_X\, \omega_i > N/(8 \log a) - N/(16 \log\log N) = N/(16 \log\log N).$

Since this inequality holds for all i such that i < log N and
$2^i = R_i > 24 \cdot (\log N) \cdot (\log\log N)$, we conclude that

$\mathrm{mean}_X\, T \ge \mathrm{mean}_X\, \Omega = \sum_i \mathrm{mean}_X\, \omega_i$

$> [\log N - \log(24 (\log N)(\log\log N))] \cdot N/(16 \log\log N)$

$\ge N \log N/(17 \log\log N)$

provided N is sufficiently large. Thus, we have proved case (i).

The proof for case (ii) is somewhat simpler. We can easily show that
if $\omega + 2C \cdot \log T \le R/2$ then $T^C \cdot 2^{\omega} < 2^{3R/4}$.
From this we deduce in a similar way to case (i) that

$\mathrm{mean}_X\, (\omega + 2C \cdot \log T) > R/3.$

If $\mathrm{mean}_X\, T < N^2$ and $R_i > 48 \cdot C \cdot \log N$ then
$\mathrm{mean}_X\, \omega_{i,2j} > R_i/4$ and
$\mathrm{mean}_X\, \omega_i > N/8$, so

$\mathrm{mean}_X\, T \ge \mathrm{mean}_X\, \Omega = \sum_i \mathrm{mean}_X\, \omega_i > N (\log N)/9.$

This proves case (ii). Of course a proof solely for oblivious machines
would be very easy since $\omega_{i,j}$, $t_{i,j}$, T, etc. are
independent of X. □

We can immediately extend this proof, removing the dependence on $K_N$.
We show that nearly all numbers as multipliers yield a function nearly
as retrorse as multiplication by $K_N$. In the situation of Figure 1 and
Lemma 1, let us replace $K_N$ by an arbitrary N-digit number $K_N^*$.

Lemma 5. For any h, $0 < h < 2^R$, if for some i, $Y_i = Y_{i+h}$, then
$K_{2R}^*$ must have one of at most $2^{R+1}$ possible values.

Proof. As in the proof of Lemma 1, if $Y_i = Y_{i+h}$, then

$h \cdot K_{2R}^* \equiv a \pmod{2^{2R}}$

for some a, $|a| < 2^R$.

Let $d = \gcd(h, 2^{2R})$. Then d | a, so a = kd, where $|kd| < 2^R$.
Also, $d \mid 2^{R-1}$ by definition of d and the fact that $h < 2^R$.
Hence,

$k \in \left\{ -\frac{2^R}{d} + 1, -\frac{2^R}{d} + 2, \ldots, -1, 0, 1, \ldots, \frac{2^R}{d} - 1 \right\},$

so there are $(2 \cdot 2^R/d) - 1$ such k's.

By elementary number theory, there are exactly d values of $K_{2R}^*$
in the range $0 \le K_{2R}^* < 2^{2R}$ which satisfy

$h K_{2R}^* \equiv kd \pmod{2^{2R}}.$

Hence, there are at most

$d \cdot \left(\frac{2 \cdot 2^R}{d} - 1\right) < 2^{R+1}$

values of $K_{2R}^*$ for which $Y_i = Y_{i+h}$. □

From this lemma, it follows at once that at least half of all possible
values of $K_{2R}^*$ have the property that for all h,
$0 < h \le 2^{R-2}$, and for all i, $Y_i \ne Y_{i+h}$. Hence, for these
values of $K_{2R}^*$, at most 4 different W's yield the same Y. Therefore
the proof of Theorem 1 can be followed very closely except that
"$\mathrm{mean}_X$" is replaced throughout by
"$\mathrm{mean}_{K_N^*}\, \mathrm{mean}_X$". Thus we have shown:



Theorem 2. There is a constant c such that for any BAM $\mathfrak{M}$ which
performs on-line multiplication, the mean time, T(N), for pairs of
N-digit numbers satisfies the following bounds for all sufficiently
large N.

(i) If $\mathfrak{M}$ is polynomial-limited or uniform,
T(N) > c N log N/log log N;
(ii) If $\mathfrak{M}$ is a multitape Turing machine or is oblivious,
T(N) > c N log N.

We know of no direct implication between Theorems 1 and 2.



7. EXTENSIONS 

In this section we shall outline some of the ways in which the 
classes already considered can be extended while remaining susceptible 
to the same proof methods. 

A simple "random-access" machine could be modelled by a BAM with 
a storage structure based upon some sort of binary tree, so that 
locations are "addressed" by binary sequences and accessed in time
proportional to the address length. Such a structure is of course
exponentially, rather than polynomially, limited. However we recall 
that the latter property is used in the proof only to allow head positions 
to be specified just before a new input symbol is to be read. The proof 
goes through just as before therefore, provided that the tree structure 
is used in such a way that the heads are returned to the root before 
each input is read, for example if all the "random-accessing" is 
accomplished by a subroutine. 

An alternative approach to a random-access store is the structure 
based on the free group on two generators, a,b, with the four 
shifts being left multiplication by $a$, $a^{-1}$, $b$, $b^{-1}$. This can be
operated as a quite serviceable random-access store and is of course 
uniform. 

A point of merely technical interest is that the same bounds may
be easily proved when the polynomial-limited class is extended by
replacing "$ct^d$" in the definition by "$c2^{t^\epsilon}$" for any
$\epsilon < 1$. Unfortunately we know of no natural class of machines
which takes advantage of this extension.

Finally we show that without impairing the proof for any of the four
classes of machines we may add "oracles", and indeed more, in the
following way. The BAMs are extended by allowing an infinite number of
states in the control subject only to the restriction that just a finite
number of them may read the input tape. An "oracle", which may even be
non-recursive, could be invoked with such a machine to read a sequence
of storage locations and put the result of applying its oracular function
in some other sequence of locations. This would take just the number of
steps required to access the locations. The proofs are unaffected by
this relaxation. A simple example which emphasizes the importance to
our proof of the on-line restriction is a multitape Turing machine with
an oracle to perform (off-line) multiplication in linear time.
The cN log N lower bound applies even to this machine.

8. UPPER BOUNDS 

An important technique for establishing upper bounds for on-line 
multiplication is given by M. Fischer and Stockmeyer [4]. Their 
construction shows that, for a wide range of machine classes including 
multitape Turing machines, oracle Turing machines, and oblivious machines, 
given any off-line (i.e. unrestricted) multiplication machine with time 
complexity T(N), where T satisfies T(2N) ≥ 2T(N), an on-line machine can be
produced with time complexity no greater than cT(N) log N.

A slight extension of their methods shows that on-line multiplication 
of N-digit integers, where one of the numbers has at most log N "1"-digits,
can be performed in time O(N log N). In particular, there is a Turing machine
for on-line multiplication by $K_N$ with complexity O(N log N), matching the
bound of Theorem 1 (ii).

General on-line multiplication algorithms may be obtained by applying 
the Fischer-Stockmeyer result to the off-line algorithm of
Schönhage-Strassen [7], which on a Turing machine has complexity
O(N log N log log N). With the facility of constructing and rapidly
accessing a multiplication table for log N digit numbers such as is
provided by a random-access machine, the Schönhage-Strassen off-line
multiplication algorithm can be performed in time O(N log N).



In Figure 3 we set out some of the upper bounds derived from the 
above results for three classes of machines. Constant factors are 
omitted and underlining denotes that a lower bound of the same order has 
been demonstrated in previous sections. The first class is multitape 
Turing machines with one-dimensional tapes; the second is BAMs with an 
infinite number of states under the restriction on input states given in 
Section 7; the third class is either version of "random-access" machine 
described in Section 7. With the uniform structure based on the free 
group on two generators, it is easy to simulate Turing machines and stores 
with "random-access". BAMs with the binary tree structure and the 
restriction on head positions given in Section 7 are also sufficiently 
powerful to allow a fast implementation of the required algorithms, 
though the programming techniques needed are less straightforward. 

9. CONCLUSION 

In this paper we have described a powerful counting argument based 
on the notion of "overlap" and have investigated the extent and limitations 
of its applicability. Overlap arguments are applicable only under the
on-line restriction, but in many cases they can lead to complexity bounds
which are optimal within a constant factor.

An important objective for future research is to obtain nonlinear
lower bounds without the severe restriction to on-line computation.

Such results, even for oblivious [...], would